Section: New Results

Design and implementation of performance or energy-aware parallel optimization algorithms

Real-world problems are becoming increasingly complex and time-intensive, and their resolution requires harnessing ever more computational resources. At the same time, recent advances in hardware architecture provide such tremendous computational power through massively multi-core and GPU infrastructures. This huge number of cores is often delivered through heterogeneous single or multiple clusters. Exploiting such infrastructures raises two fundamental and conflicting issues, which are two major challenging perspectives of the Dolphin project investigated during 2012: (1) the performance-aware issue: how to design, implement and validate efficient and effective algorithms on such target machines to solve large-size combinatorial optimization problems? (2) the energy-aware issue: deploying large-scale algorithms on a large amount of computational resources is energy-consuming; the question is therefore how to address the performance issue while minimizing the energy consumption.

To deal with these issues, we have proposed new approaches summarized in the following sections.

Design and implementation of performance-aware optimization algorithms

In order to solve large-size combinatorial optimization problems, we have revisited the design and implementation of meta-heuristics and exact (Branch-and-Bound) algorithms for two major hardware platforms: heterogeneous multi- and many-core clusters, and computational grids composed of multiple clusters.

  • Multi-core GPU-based hybrid meta-heuristics - Participants: T-V. Luong, N. Melab and E-G. Talbi.

    In [28], we have revisited the design and implementation of both single-solution and population-based meta-heuristics, originally targeting a single-core CPU coupled with a GPU device, and we have investigated and proposed a new guideline for combining multi-core and GPU computing in hybrid meta-heuristics. Efficient approaches have been proposed for optimizing CPU-GPU data transfers and for distributing tasks between the GPU device and the CPU cores (the first sketch after this list illustrates such a task repartition). Extensive experiments have been performed on an 8-core CPU coupled with a GPU card, using ant colonies combined with a local search applied to the Quadratic Assignment Problem (QAP). The reported results show that using multi-core computing in addition to GPU computing provides a performance improvement of up to 75%.

  • GPU-accelerated Branch-and-Bound algorithms - Participants: I. Chakroun and N. Melab.

    Branch-and-Bound (B&B) algorithms are based on an implicit enumeration of a dynamically built tree-shaped search space. Pruning tree nodes (sub-problems) is traditionally used as a powerful mechanism to reduce the size of the explored search space. This mechanism requires performing the bounding operation, which consists in applying a lower-bound function to the generated sub-problems. Preliminary experiments performed on the Flow-Shop scheduling problem (FSP) have shown that the bounding operation consumes over 98% of the execution time of the B&B algorithm. We have therefore investigated GPU computing as a major complementary way to speed up the search. We have revisited the design and implementation of the parallel bounding model for FSP on GPU accelerators, dealing with two major issues: (1) thread divergence, caused by the highly irregular nature of the explored tree and the SIMD execution model of the GPU; (2) data access optimization, required to map the different data structures efficiently onto the memory hierarchy of the GPU device. In [14], we have proposed a GPU-based parallel bounding model together with a data refactoring approach to deal with thread divergence (the second sketch after this list illustrates the pool-based bounding scheme). In [45] (an extended version submitted to the CCPE journal is under revision), we have proposed an efficient data access optimization strategy based on a deep analysis of the complexity of the different data structures of the FSP lower-bound algorithm in terms of memory size and access latency. The approaches proposed for these two issues have been extensively experimented using an Nvidia Tesla C2050 GPU card. Compared to a CPU-based execution, speedups of more than ×100 are achieved for large problem instances.

  • Peer-to-peer Branch-and-Bound algorithms - Participants: T-T. Vu, B. Derbel and N. Melab.

    To deal with dynamic load balancing in large-scale distributed systems, we have proposed in [50] to organize computing resources following a logical peer-to-peer overlay and to distribute the load according to this overlay. We use a tree as the logical structure connecting distributed nodes and balance the load according to the size of the induced subtrees (the third sketch after this list illustrates this subtree-proportional distribution). We have conducted extensive experiments involving up to 1,000 computing cores and provided a thorough analysis of the different properties of our generic approach for two different applications, namely the standard Unbalanced Tree Search and the more challenging parallel Branch-and-Bound algorithm. Substantial improvements are reported in comparison with classical random work stealing and two finely tuned application-specific strategies taken from the literature.
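
The following minimal sketch illustrates the kind of CPU/GPU task repartition discussed in the first item above: a local-search neighborhood is evaluated partly by a GPU kernel and partly by the CPU cores. The evaluate function, the 70%/30% split and all identifiers are illustrative assumptions, not the actual code of [28].

    // Hypothetical sketch: splitting the evaluation of a local-search
    // neighborhood between a GPU device and the CPU cores.
    #include <cuda_runtime.h>
    #include <algorithm>
    #include <cstdio>
    #include <thread>
    #include <vector>

    __host__ __device__ float evaluate(int move) {
        // Placeholder fitness of one neighbor (stand-in for a QAP move evaluation).
        return static_cast<float>((move * 2654435761u) % 1000) / 1000.0f;
    }

    __global__ void evalNeighborsGPU(const int* moves, float* fit, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) fit[i] = evaluate(moves[i]);   // one GPU thread per neighbor
    }

    int main() {
        const int nNeighbors = 1 << 16;
        const int nGpu = nNeighbors * 7 / 10;     // assumed repartition: 70% of the neighborhood on GPU
        std::vector<int> moves(nNeighbors);
        for (int i = 0; i < nNeighbors; ++i) moves[i] = i;
        std::vector<float> fitness(nNeighbors);

        // GPU share: one transfer in, one kernel launch, one transfer out.
        int* dMoves; float* dFit;
        cudaMalloc(&dMoves, nGpu * sizeof(int));
        cudaMalloc(&dFit, nGpu * sizeof(float));
        cudaMemcpy(dMoves, moves.data(), nGpu * sizeof(int), cudaMemcpyHostToDevice);
        evalNeighborsGPU<<<(nGpu + 255) / 256, 256>>>(dMoves, dFit, nGpu);

        // CPU share: remaining neighbors split over the available cores.
        unsigned nCores = std::max(1u, std::thread::hardware_concurrency());
        std::vector<std::thread> workers;
        int chunk = (nNeighbors - nGpu + nCores - 1) / nCores;
        for (unsigned c = 0; c < nCores; ++c)
            workers.emplace_back([&, c] {
                int begin = nGpu + static_cast<int>(c) * chunk;
                int end = std::min(nNeighbors, begin + chunk);
                for (int i = begin; i < end; ++i) fitness[i] = evaluate(moves[i]);
            });
        for (auto& w : workers) w.join();

        cudaMemcpy(fitness.data(), dFit, nGpu * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dMoves); cudaFree(dFit);
        // the best neighbor would be selected from 'fitness' here
        printf("evaluated %d neighbors (%d on GPU, %d on CPU cores)\n",
               nNeighbors, nGpu, nNeighbors - nGpu);
        return 0;
    }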
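
The second sketch below illustrates, under simplifying assumptions, the pool-based parallel bounding scheme mentioned in the second item: the host offloads a pool of pending sub-problems, the GPU computes one lower bound per thread through a uniform code path (limiting thread divergence), and pruning against the best known solution is done back on the host. The toy lowerBound function and the Node layout are assumptions, not the flow-shop bound of [14], [45].

    // Hypothetical sketch of pool-based parallel bounding for B&B on GPU.
    #include <cuda_runtime.h>
    #include <cstdio>
    #include <vector>

    struct Node { int depth; int partialCost; };

    __device__ int lowerBound(const Node& n) {
        // Toy bound: partial cost plus an optimistic estimate of what remains.
        return n.partialCost + (20 - n.depth);
    }

    __global__ void boundKernel(const Node* pool, int* lb, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) lb[i] = lowerBound(pool[i]);   // same code path for every thread
    }

    int main() {
        std::vector<Node> pool(4096);
        for (size_t i = 0; i < pool.size(); ++i)
            pool[i] = Node{ static_cast<int>(i % 20), static_cast<int>(i % 97) };
        int best = 90;                            // cost of the best known solution

        Node* dPool; int* dLb;
        int n = static_cast<int>(pool.size());
        cudaMalloc(&dPool, n * sizeof(Node));
        cudaMalloc(&dLb, n * sizeof(int));
        cudaMemcpy(dPool, pool.data(), n * sizeof(Node), cudaMemcpyHostToDevice);
        boundKernel<<<(n + 255) / 256, 256>>>(dPool, dLb, n);

        std::vector<int> lb(n);
        cudaMemcpy(lb.data(), dLb, n * sizeof(int), cudaMemcpyDeviceToHost);
        cudaFree(dPool); cudaFree(dLb);

        int kept = 0;
        for (int i = 0; i < n; ++i)
            if (lb[i] < best) ++kept;             // prune sub-problems whose bound >= best
        printf("kept %d of %d sub-problems\n", kept, n);
        return 0;
    }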
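
Finally, the sketch below gives a minimal illustration of the subtree-proportional load distribution of the third item: each peer of the logical tree overlay receives a share of the work pool proportional to the number of cores in its induced subtree, and rounding leftovers would be re-balanced at run time by work stealing. The Peer structure and the recursive splitting are illustrative assumptions, not the protocol of [50].

    // Hypothetical sketch: distributing a work pool over a logical tree overlay
    // proportionally to the size of the induced subtrees.
    #include <cstdio>
    #include <vector>

    struct Peer {
        int cores;                       // computing cores hosted by this peer
        std::vector<Peer*> children;
    };

    int subtreeCores(const Peer& p) {
        int total = p.cores;
        for (const Peer* c : p.children) total += subtreeCores(*c);
        return total;
    }

    // Distribute 'work' units over the subtree rooted at 'p', proportionally to
    // subtree sizes; the local peer keeps the share matching its own cores.
    void distribute(const Peer& p, long work, int depth) {
        long total = subtreeCores(p);
        long local = work * p.cores / total;
        printf("%*speer(%d cores) keeps %ld units\n", 2 * depth, "", p.cores, local);
        for (const Peer* c : p.children) {
            long share = work * subtreeCores(*c) / total;
            distribute(*c, share, depth + 1);
        }
        // rounding leftovers would be handled by work stealing at run time
    }

    int main() {
        Peer leaf1{8, {}}, leaf2{8, {}}, leaf3{16, {}};
        Peer mid{4, {&leaf1, &leaf2}};
        Peer root{4, {&mid, &leaf3}};
        distribute(root, 1000000, 0);    // e.g. one million B&B sub-problems
        return 0;
    }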

Design and implementation of energy-aware optimization algorithms

Participants: Y. Kessaci, N. Melab and E-G. Talbi.

Cloud computing is an emerging paradigm of distributed computing in which applications, data and infrastructures are offered as services that can be consumed in a ubiquitous, flexible and transparent way. These benefits are delivered through cloud managers, which hide from the user complex and challenging issues such as scheduling. However, the solutions to these issues provided by cloud managers are sometimes limited. For instance, the scheduling approach proposed in many cloud managers, such as OpenNebula, is limited regarding the criteria taken into account: energy consumption, which is highly critical for many applications such as High Performance Computing (HPC), is rarely considered. In [42] (selected for a special issue of the FGCS journal), we have addressed the energy-aware scheduling of energy- and time-consuming applications on cloud infrastructures. We have proposed a multi-start parallel local search heuristic for cloud managers (EMLS-ONC), with the focus put on OpenNebula; a simplified sketch of such a multi-start local search is given below. EMLS-ONC has been experimented with different virtual machine (VM) arrival scenarios and different hardware infrastructures. The results show that EMLS-ONC outperforms the scheduler provided in OpenNebula by a significant margin in terms of both energy consumption and the number of scheduled VMs.
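
As a rough illustration of the multi-start local search principle behind EMLS-ONC, the sketch below repeatedly builds a random VM-to-host placement and improves it by single-VM relocations, keeping the placement with the lowest estimated energy. The linear power model, the absence of capacity constraints and all parameters are simplifying assumptions, not the actual EMLS-ONC scheduler of [42].

    // Hypothetical sketch of a multi-start local search for energy-aware VM placement.
    #include <cstdio>
    #include <random>
    #include <vector>

    struct Host { int capacity; double idleW, peakW; };
    struct Vm   { int load; };

    // Energy proxy: sum of host power draws, linear in utilization; idle power
    // is charged only to hosts that run at least one VM.
    double energy(const std::vector<Host>& hosts, const std::vector<Vm>& vms,
                  const std::vector<int>& assign) {
        std::vector<int> used(hosts.size(), 0);
        for (size_t v = 0; v < vms.size(); ++v) used[assign[v]] += vms[v].load;
        double e = 0.0;
        for (size_t h = 0; h < hosts.size(); ++h)
            if (used[h] > 0)
                e += hosts[h].idleW +
                     (hosts[h].peakW - hosts[h].idleW) * used[h] / hosts[h].capacity;
        return e;
    }

    int main() {
        std::vector<Host> hosts = {{100, 80, 200}, {100, 80, 200}, {200, 120, 300}};
        std::vector<Vm>   vms(12, Vm{20});
        std::mt19937 rng(42);
        std::uniform_int_distribution<int> pickHost(0, (int)hosts.size() - 1);

        std::vector<int> best;
        double bestE = 1e30;
        for (int start = 0; start < 16; ++start) {        // multi-start
            std::vector<int> cur(vms.size());
            for (int& a : cur) a = pickHost(rng);          // random initial placement
            double curE = energy(hosts, vms, cur);
            for (int it = 0; it < 500; ++it) {             // local search: relocate one VM
                std::vector<int> cand = cur;
                cand[rng() % vms.size()] = pickHost(rng);
                double candE = energy(hosts, vms, cand);
                if (candE < curE) { cur = cand; curE = candE; }
            }
            if (curE < bestE) { bestE = curE; best = cur; }
        }
        printf("best energy estimate over 16 starts: %.1f W\n", bestE);
        return 0;
    }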